ABSTRACT:

The  previous  study  in  this  series  showed  that  evaluation  of 
R&D  activities  rests  eventually  on  qualitative  judgments. 
The purpose of this study was to develop, validate, and make a trial
application of a procedure for obtaining qualitative judgments
economically and efficiently. The Ford procedure for scaling
partially ordered sets of rankings was programmed and validated
using  an  abstract  judgmental  task  with  an  extrinsic  criterion. 
It  was  given  a  trial  application  requiring  the  ordering  on 
merit  of  current  personnel  research  projects.   Both  validation 
and  trial  application  results  were  highly  satisfactory.   It 
was  concluded  that  the  Ford  procedure  could  be  used  to  obtain 
scaled  qualitative  judgments  in  a  wide  variety  of  settings 
with  accuracy,  efficiency,  and  economy.   Flow  charts,  data 
setup,  and  the  complete  computer  program  are  given. 

This  research  was  supported  in  part  by  the  Personnel  Research 
Division,  Bureau  of  Naval  Personnel,  through  Project  Order  No. 
1-0001, Naval Personnel Research and Development Laboratory.

PREFACE

This  report  is  the  second  in  a  research  project  between  the 
sponsoring  activity,  the  Personnel  Research  Division  of  the  Bureau 
of  Naval  Personnel,  and  the  Naval  Postgraduate  School.   The  study  was 
performed under the auspices of Capt. G. F. Britner, Division
Director, and Mr. A. A. Sjoholm, Technical Director, Personnel Research
Division. 

We  would  like  to  express  our  thanks  to  Dr.  Frank  M.  Andrews, 
Survey  Research  Center,  Institute  for  Social  Research,  University  of 
Michigan,  for  providing  a  copy  of  the  Michigan  Ford  Program  on  which 
much  of  this  work  was  based. 

Portions of this work were done for a master's thesis in operations
research by the junior author under the direction of the senior
author.

Various aspects of this work were presented at the Research
and Development Working Group, 28th Military Operations Research
Symposium, Ft. Lee, Va., in November 1971, and at the XIXth International
Meeting of The Institute for Management Sciences, Houston, Texas, in
April 1972. The distribution list reflects the requests for this
paper as a result of these presentations. It is hoped that recipients
of this report will find it useful in the many different contexts of
research indicated by their addresses and positions.

BRIEF

The previous study in this series showed that there are no generally
applicable, hard measures of the effectiveness of an R&D laboratory's
activities and operations. The basis for determining effectiveness
of a laboratory eventually narrows down to the judgments of persons
who, for various reasons, are deemed qualified to make such judgments.

This  being  the  case,  it  follows  that  the  evaluation  process  can 
be improved by developing practical methods for obtaining and processing
judgments that are simple to apply, broadly applicable, and faithfully
reflect the contribution of each judge. Ideally, the results
should  be  expressed  quantitatively  to  permit  their  use  in  conjunction 
with  other  statistical  and  mathematical  tools. 

To have these characteristics, a method should permit an individual
judge, faced with a set of alternatives to "prioritize", to rate
only  those  with  which  he  is  familiar,  to  set  his  own  measurement  scale, 
and  to  make  use  of  ties  when  he  sees  no  difference  between  alternatives. 
The  Ford  procedure  permits  a  judge  to  behave  in  this  manner.   It  was 
originally  programmed  for  computer  application  by  the  Survey  Research 
Center,  University  of  Michigan.   The  program  was  obtained  and  adapted 
for  use  on  the  computing  facilities  of  the  Naval  Postgraduate  School 
(NPS), which uses an IBM 360/67 system. The program, along with
explanatory instructions, is reproduced in this report.

To show that the Ford program was broadly applicable and effective,
a validation test was conducted using an abstract, vague rating task
for which there was, unknown to the judges, an independent set of
quantitative "truth" data for comparison. Next, a trial application of the
program was made in which Navy officers rated current personnel research
projects as to the advisability of retaining and pursuing them in the
R&D program. Finally, the Ford procedure was used in a real-life
situation to analyze student ratings of courses in the NPS operations research
program. The Ford rating procedure and NPS computer program were highly
satisfactory in all of these test applications.

It was concluded that a simple, effective, and broadly useful
procedure for obtaining and scaling the evaluative opinions of judges
had been developed, tested, and applied. The suggestion was made to
use the procedures to analyze project selection in the Navy's personnel
research laboratories, since it is widely recognized that, for a
laboratory to be effective, it must be working on the right programs at the
right time.

I.   PURPOSE AND SCOPE

The previous study in this series (Arima, 1971) discussed various
factors associated with the effectiveness of Federal in-house
laboratories. The problem of evaluating the effectiveness of a specific
laboratory, such as the Navy's personnel research laboratories,
was of special interest. Approaches to this evaluation problem seemed
ultimately to require a qualitative assessment of a laboratory's
effectiveness or some aspect of its operations by knowledgeable
individuals. Accordingly, one specific problem identified as a result of
the preliminary study was to develop and test a method for obtaining
and analyzing such assessments from qualified judges in an economical,
convenient, and effective manner. This report addresses itself to
this problem.

The approach taken to solve the problem, explicated in the
pages that follow, was: (1) adapt Ford's (1957) procedure, as
programmed by Pelz and Andrews (1966), for creating numerical rankings
from a set of incomplete comparisons of objects by a group of judges
to operate on the Naval Postgraduate School's IBM 360/67 system, (2)
validate the procedures using an arbitrary task with an extrinsic
criterion measure, and (3) test the feasibility of using the procedures
to obtain an ordered set of qualitative judgments on an R&D problem
appropriate to the environment and mission of the Navy's personnel
research laboratories.


II.  THE  FORD  PROCEDURE 

A.   FLEXIBILITY  OF  PROCEDURES. 

There  are  three  characteristics  of  Ford's  procedure  that  make 
it especially appropriate for obtaining judgments on several alternatives
or items from a diverse group of judges. First, a judge or
rater  adjudicates  only  those  items  that  he  feels  competent  to  judge. 
Second,  he  can  make  his  judgments  as  coarse  or  as  fine  as  he  desires 
because  there  is  no  restriction  on  how  many  judgmental  categories  he 
must  use.   And  third,  there  is  no  requirement  for  a  fixed  distribution 
of  items  among  the  categories,  except  that,  collectively  over  judges, 
no  more  than  one  third  of  all  items  being  rated  should  be  in  any  one 
category.   A  judge,  for  example,  might  decide  to  judge  only  half  of  a 
pool  of  items  using  three  categories — high,  medium  and  low. 

The ease of this method can be compared with other frequently
used methods that may require one or more of the following
restrictions: all items must be ranked with no ties, each item is to be
compared with every other item with no indeterminate category permitted,
an equal number of items must be placed in each rating category, and
so forth. Such restrictions are usually imposed because of statistical
considerations in the analytical procedures. Unfortunately,
persons who are unfamiliar with the statistical considerations are
often alienated by the results of the procedures because, while serving
as judges, they had to make too many arbitrary decisions in which
they felt no confidence. A more serious consequence of such procedures
is that a large amount of noise might be added to the
judgments so that the "signal" present in the judgments cannot be
discriminated. Moreover, some of the techniques, such as paired
comparisons, are excessively demanding of a judge's time. Thus, the
statistical rigor is offset by serious negative consequences of the
procedures involved.

At this point, it should be noted that the procedures being
developed here are not the same as those designed to achieve a consensus
or decision among a group of judges, such as some applications of the
Delphi technique. Consensus procedures tend to be used when the
alternatives and judges are few in number, when any of the alternatives
is a reasonable choice, and when the problem is one of reaching consensus
rather than evaluating the relative merit of the alternatives. Such
procedures tend to disregard the contribution of the individual and
depend on devious group processes and feedback to eliminate, eventually,
any individuality not consonant with the prevailing group trend. It
should be pointed out that there is no way to determine to what extent
the final decision is based on the relative merits of the items entering
into the decision and to what extent on the group processes employed
in arriving at a consensus. The procedures being developed here, on the
other hand, produce a composite judgment that reflects the contribution
of each judge according to the proportionate number of judgments he
makes. The results of the procedure do not, however, produce a clear-cut
decision or unanimity of opinion. Other factors and other methods must be
employed for the decision-making process using the composite judgments
as a data base. Bartee (1971), for example, suggests a linear
programming approach with zero-one variables. In many cases, however, the
scaled alternatives might be an end in themselves with actions taking
on priorities according to their scaled values.

B.   DETAILS  OF  THE  FORD  PROCEDURE 

The Ford procedure is based on forming a win-loss matrix,
$A = (a_{ij})$, where $a_{ij}$ represents the number of times object $i$ is
preferred over object $j$ by the judges, and $a_{ii} = 0$. Ties and
nonjudged items do not enter the matrix for any one judge,
since no win-loss determination has been made. Thus, each judge
contributes to the composite judgment only those pairwise instances
in which he has preferred one alternative over another. The Ford
procedure then determines a weight, $w_i$, for each item. These weights
are interpreted as odds in the sense that the probability of item $i$
being preferred to item $j$ in any comparison is taken to be
$w_i/(w_i + w_j)$. These probabilities could then be used to compute
matrix $A$. The set of these weights is the maximum likelihood estimate,
that is, the set of weights that maximizes the likelihood of obtaining
the original matrix, $A$. The weights are obtained by solving iteratively
the equation

$$w_i^{n+1} = \frac{\sum_j a_{ij}}{\sum_j \dfrac{a_{ij} + a_{ji}}{w_i^{n} + w_j^{n}}} \qquad (1)$$

where $a_{ij}$ = number of times object $i$ was preferred to object $j$;
$a_{ji}$ = number of times object $j$ was preferred to object $i$;
$w_i^{n}$ = weight assigned to object $i$ on the $n$th iteration; and
$w_j^{n}$ = weight assigned to object $j$ on the $n$th iteration. The
weights are win percentages on the first iteration. The iteration stops
in the computer program when a predetermined convergence criterion is
reached or a predetermined number of iterations has been completed.
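As a small worked illustration of equation (1), consider a hypothetical two-object case in which object 1 was preferred to object 2 three times and object 2 was preferred to object 1 once. These counts are invented for illustration and are not taken from the study. Taking the win percentages as starting weights gives:

```latex
\begin{align*}
a_{12} &= 3, \quad a_{21} = 1, \qquad w_1^{0} = \tfrac{3}{4}, \quad w_2^{0} = \tfrac{1}{4} \\
w_1^{1} &= \frac{a_{12}}{\dfrac{a_{12} + a_{21}}{w_1^{0} + w_2^{0}}} = \frac{3}{4/1} = \tfrac{3}{4} \\
w_2^{1} &= \frac{a_{21}}{\dfrac{a_{21} + a_{12}}{w_2^{0} + w_1^{0}}} = \frac{1}{4/1} = \tfrac{1}{4} \\
P(1 \text{ preferred to } 2) &= \frac{w_1}{w_1 + w_2} = \tfrac{3}{4}
\end{align*}
```

In this two-object case the starting weights are already a fixed point of the iteration, and the implied preference probability equals the observed proportion of 3 wins in 4 comparisons. With more than two objects the weights generally change over several iterations.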

There was one assumption in Ford's procedure that made it
difficult to apply in practice. This was a partition assumption which
stated that in any partition of the items in the win-loss matrix into
two nonempty subsets, some item in each subset had to be preferred at
least once to some item in the other subset. That is, the initial
$w_i$ and $w_j$ could not be 1 and 0 in equation (1). This rule would be
broken in the case of universally high and universally low alternatives
and in any subset where all judgments are in one direction. Pelz and
Andrews (1966) solved this problem by first removing universally high
and low items from the win-loss matrix before computing the weights and
by adding a very small constant, .00001, to each of the remaining entries
in the matrix. These procedures permitted them to program Ford's
procedures for computer processing of judgments involving 130 judges and
130 items. Accordingly, the Pelz and Andrews program was used as a
starting point for adapting Ford's procedure to the Naval Postgraduate
School's IBM 360/67 system. The program as adapted for the IBM 360/67
system will hereafter be referred to as the Ford program.
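The two adjustments attributed to Pelz and Andrews can be sketched in Python as follows. This is a minimal illustration, not the program reproduced in Appendix III. The function names are hypothetical, and two details are assumptions: universal highs and lows are removed in a single pass, and the constant is added to off-diagonal cells only, so that $a_{ii}$ remains 0.

```python
EPSILON = 0.00001  # the small constant cited in the text


def remove_universals(matrix):
    """Drop items that never lost (universal highs) or never won (universal lows).

    matrix[i][j] is the number of times item i was preferred to item j.
    Returns (reduced_matrix, kept_indices, highs, lows).
    Assumption: a single pass over the original matrix is made.
    """
    n = len(matrix)
    wins = [sum(matrix[i]) for i in range(n)]
    losses = [sum(matrix[j][i] for j in range(n)) for i in range(n)]
    highs = [i for i in range(n) if wins[i] > 0 and losses[i] == 0]
    lows = [i for i in range(n) if losses[i] > 0 and wins[i] == 0]
    kept = [i for i in range(n) if i not in highs and i not in lows]
    reduced = [[matrix[i][j] for j in kept] for i in kept]
    return reduced, kept, highs, lows


def add_constant(matrix, eps=EPSILON):
    """Add eps to every off-diagonal cell (assumption: the diagonal stays 0)."""
    n = len(matrix)
    return [[matrix[i][j] + eps if i != j else 0.0 for j in range(n)]
            for i in range(n)]
```

After these two steps every remaining item has a nonzero count of both wins and losses, so no weight is forced to the extremes described by the partition assumption.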

C.   THE  FORD  PROGRAM 

A  flow-chart  of  the  program  is  included  at  Appendix  I.   The  data 
assembly  for  input  to  the  program  is  shown  in  Appendix  II.   The  program, 
itself,  with  explanatory  comments  is  reproduced   at  Appendix  III. 


Two decisions are required by the person using the program.
First, he must specify the convergence criterion for the iterative
determination of the weights. This report uses .005. That is, when the
weights do not change by that amount in successive iterations, a
satisfactory stabilization of the weights is accepted. Second, the user
must specify how many iterations are to be conducted in the event the
convergence criterion is not reached. This report uses 50. As will
be shown, the rank ordering of the items, as determined from their
weights, stabilizes rapidly. Accordingly, even if the convergence
criterion is not met, the rank ordering is acceptable. When the
convergence criterion is met, the weights can be used as an interval scaling
of the judged items.
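The two user decisions map naturally onto two parameters of an iteration loop. The Python sketch below is illustrative only and is not the Appendix III program. The names `ford_scale`, `tol`, and `max_iter` are hypothetical, and it is an assumption that "change" means the largest absolute change in any single weight between successive iterations. It also assumes universal highs and lows have already been removed, so that every item has at least one win and one loss.

```python
def win_percentages(matrix):
    """Starting weights: wins divided by total comparisons for each item."""
    n = len(matrix)
    weights = []
    for i in range(n):
        wins = sum(matrix[i])
        losses = sum(matrix[j][i] for j in range(n))
        weights.append(wins / (wins + losses))
    return weights


def ford_step(matrix, w):
    """One pass of equation (1): wins over the sum of comparisons / (w_i + w_j)."""
    n = len(matrix)
    new_w = []
    for i in range(n):
        wins = sum(matrix[i])
        denom = sum((matrix[i][j] + matrix[j][i]) / (w[i] + w[j])
                    for j in range(n) if j != i)
        new_w.append(wins / denom)
    return new_w


def ford_scale(matrix, tol=0.005, max_iter=50):
    """Iterate until no weight changes by tol or more, or max_iter is reached.

    Returns (weights, iterations_run, converged).
    """
    w = win_percentages(matrix)
    for iteration in range(1, max_iter + 1):
        new_w = ford_step(matrix, w)
        change = max(abs(a - b) for a, b in zip(new_w, w))
        w = new_w
        if change < tol:
            return w, iteration, True
    return w, max_iter, False
```

For example, `ford_scale([[0, 4, 5], [2, 0, 3], [1, 3, 0]])` returns weights in which the first item ranks highest and the third lowest. The returned flag lets a caller distinguish weights that met the criterion from those that only support a rank ordering.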

The program operates in three subroutines or cores. The first
core assigns an ID number (hereafter called "assigned ID number") to
each rated alternative as it is read into the computer and then
computes how many comparisons are to be made between pairs of alternatives,
excluding ties.

The second core forms the win-loss matrix, eliminates universal
highs and lows, assigns the small constant to each cell, and then
computes the initial weights.
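The formation of the win-loss matrix from partially ordered judgments can be sketched as follows. This is an illustrative Python version under stated assumptions, not the program's actual input format, which is given in Appendix II. Each judge is represented by a hypothetical dictionary mapping the items he chose to rate to a rank, with a lower rank meaning more preferred. Unrated items are simply absent, and tied items contribute no comparison.

```python
from itertools import combinations


def build_win_loss(judgments, items):
    """Form the win-loss matrix and count each judge's untied comparisons.

    judgments: list of dicts, one per judge, mapping item -> rank (1 = best).
    items: list of all items; list position plays the role of the assigned ID.
    Returns (matrix, comparisons_per_judge).
    """
    index = {item: k for k, item in enumerate(items)}
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    comparisons = []
    for ranks in judgments:
        count = 0
        for a, b in combinations(ranks, 2):
            if ranks[a] == ranks[b]:
                continue  # ties do not enter the matrix
            winner, loser = (a, b) if ranks[a] < ranks[b] else (b, a)
            matrix[index[winner]][index[loser]] += 1
            count += 1
        comparisons.append(count)
    return matrix, comparisons
```

A judge who rates three items as high, low, low contributes two comparisons (high over each low) and none for the tied pair, which is how coarse and fine judgments are combined on a common footing.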

The third core performs the iterations until the weights
stabilize or until the specified number of iterations have been run. The
results are printed out showing a list of judges and the number of
comparisons made. The output gives a mapping of the assigned ID numbers
to the original numbers used for input of the variables. The win-loss
matrix is shown by assigned ID number. Finally, there is a printout
of the weights by iterations and a list of final weights shown by
assigned ID number and giving the corresponding original ID number.

III.   VALIDATION  OF  THE  FORD  PROGRAM 

A.   THE  VALIDATION  PROBLEM 

Pelz and Andrews (1966) showed some comparisons of the Ford
procedure with alternative methods for scaling partially ordered judgments.
Having shown the computational advantages of the Ford procedure, they
then demonstrated its utility in their evaluation of scientists in
organizations. They did this by having laboratory directors rate their
scientists as to their excellence in scientific research using the Ford
procedure. These ratings were then scaled and used as the criterion
variable in their studies. It should be noted, however, that the
validity of these ratings was not established in a psychometric sense
(American Psychological Association, 1954), other than that of face validity.
That is, they were not subjected to a critical comparison against some
outside criterion.

Among the other forms of validity (concurrent, predictive, and
construct), concurrent validity of the scaled judgments would be of
most interest when the judgments are to be used as a criterion measure,
dependent variable, objective function, or, in general, a measure of
effectiveness. That is, we would like to know how well the judgments
represent the true state of the world that they are presumed to
represent. This is particularly true when, as in the case of the Ford
procedure, judgments which are ordinal in nature are mapped to the system
of real numbers and used as a cardinal measure. For the application
made by Pelz and Andrews, we would like to know how accurately the
scaled ratings represent the true effectiveness of the rated scientists.
Stated in this form, the difficulty or impossibility of assessing the
concurrent validity of the scaled ratings becomes readily apparent:
judgments of this type are used because there is no other acceptable
measure of the variable in which interest lies.

In view of the foregoing, it follows that an existing, scaled
variable is needed to validate the Ford program. In its simplest form,
validation might take on the paradigm of a psychophysical experiment.
For example, a set of standard weights might be presented to judges
with the task of rating the relative heaviness of the weights. There
would be little interest in such a test of the Ford procedure, since
it would be a straightforward evaluation of a numerical estimation
function as the size of the weights varies. In a validation of the Ford
procedure, interest lies in the nature of the underlying quality of pairs
of objects as they are judged and what the relationship is of the
perceived quality to the decisions of the judges. This distinction in
emphasis is elaborated in detail by Krantz (1972). The test in a
psychophysical paradigm might be more relevant, for example, if the
judges had to rate the weights of objects differing considerably in
size and mass. Thus, an ideal validation of the Ford procedure would
take place if judges were to rate items according to an abstract or
vague variable for which there is, unknown to them, a corresponding
quantitative, objective variable that could serve as a criterion
measure. Unfortunately, the more vague or abstract a judging task becomes,
the more difficult it is to find a criterion variable that is not
equally vague. Accordingly, validation of the Ford procedure with a
challenging task will be less than rigorous and any discrepancy of the
resulting scaled judgments from the criterion values may be due to
several factors which will not be independently assessable. These
include the difficulty of the judgmental task, the capability of the
judges, the reliability of the criterion variable, and the efficiency
of the Ford program. The validation, then, will be clinical, and
hopefully diagnostic, while attempting to be rigorous.

B.   METHOD

1.   Stimulus  Materials. 

Fortunately, there is a situation that compares favorably with
the ideal validation paradigm mentioned above. It has been found that
such abstract characteristics or qualities of words as their familiarity,
meaningfulness, and associational richness are closely related to the
frequency with which they appear in the English language (Broadbent,
1967; Ekstrand, Wallace, & Underwood, 1966; Underwood, 1966).
Fortunately, too, the frequency of 30,000 words has been cataloged in what
has become known as the Thorndike and Lorge (1944) word count. Now, it
can be assumed that most individuals are not consciously aware of the
fact that familiarity, say, of English words depends on their frequency.
Accordingly, it should be possible to ask judges to rate a list of
selected words from the Thorndike and Lorge word count for their
familiarity to persons in general and compare the Ford-scaled ratings with
the Thorndike and Lorge word count, thus completing the validation.

Rather than selecting words directly from the Thorndike and
Lorge word count, an intermediate procedure was inserted to provide
some structure to the judging task and to make possible four
replications of the judging procedure. The words were actually taken from
the category norms for verbal items compiled by Battig and Montague
(1969). Their norms are based on the primacy and frequency with which
students at two large universities provided verbal associations for 56
different verbal categories, such as a precious stone, a unit of time,
and so forth. Four of these categories were chosen from which to
select words, based on the fact that there was a correlation of .90 or
greater between the two universities and that there was a long enough
list of associations from which selections could be made, graded for
their frequency in the Thorndike and Lorge count. The categories
selected, which will hereafter be referred to only by the Roman numeral
designation given below, were:

I.   A  kind  of  cloth  (r  =  .988) 

II.   A  kitchen  utensil   (r  =  .987) 
III.   A  substance  for  flavoring  food   (r  =  .977) 

IV.   A  disease   (r  =  .906) 




The  correlations  shown  are  those  between  the  two  university  groups,  and 
are  based  on  the  rank  position  occupied  by  the  words  within  a  category 
based  on  their  frequency  of  mention. 

The selection of specific words from the categories was made by
reference to the Thorndike and Lorge word count using the following
guidelines, which could be applied only approximately. Twelve words
were chosen from each category to make a test list. The 12 words were
further divided into approximately four groups with at least a 5 to 10
percent difference in frequency of occurrence between each group, based
on the Thorndike and Lorge general (G) count. Between items in each
group, there was a 1 to 3 percent difference in the frequency of
occurrence. Where there were ties in the general count, the other counts
(T, L, and S) given in the word count were used to break the ties. Thus,
there was a fairly reliable clustering of words into four frequency
ranges within each list and a less reliable ranking within the frequency
ranges. The lists are shown in Table 1. Each category provided an
independent replication for validation.

2.   Subjects 

Twenty  male  and  female  Naval  Postgraduate  School  students  ranging 
in  age  from  24  to  37  years  with  comparable  levels  of  education  served 
in the validation experiment. Each subject was used twice, and 10
subjects were assigned at random to each of the four categories.

3.   Procedure

Each  word  list  was  reproduced  in  random  order  on  a  sheet  of 
paper.   The  subjects  were  told  to  make  an  ordinal  ranking  of  the  words 
as  to  what  they  believed  their  relative  familiarity  was  to  all  people 
in  general.   They  were  further  instructed  to  judge  only  those  objects 
which  they  could  rank  with  confidence,  make  use  of  as  many  ordinal 
ranks  as  they  deemed  necessary,  and  to  place  as  many  objects  in  each 
rank  as  they  desired.   By  way  of  guidance,  they  were  instructed  to 
select  the  number  of  ordinal  ranks  they  were  willing  to  use  first  and 
then  to  write  the  number  of  the  rank  beside  the  objects  they  chose  to 
rank.   They  were  also  advised  to  give  first  impressions  and  work 
rapidly. 

C.   RESULTS 

The  orderings  made  by  the  subjects  and  processed  by  the  Ford 
program  are  shown  in  Table  2,  along  with  the  Spearman  rank  correlation 
(rho) between the Thorndike-Lorge and Ford program orderings. The
results will be examined in detail only for category I.
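For readers who wish to compute a rank correlation of this kind, the following Python sketch implements the standard Spearman formula for untied ranks, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). The report does not state how ties or removed items were treated in its own computations, so this sketch should not be expected to reproduce the tabled values exactly. The function name and the sample ranks are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank correlation for two lists of untied ranks of equal length."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("rank lists must have the same length")
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical criterion ranks and judged ranks for five items.
criterion = [1, 2, 3, 4, 5]
judged = [2, 1, 4, 3, 5]
rho = spearman_rho(criterion, judged)
```

Here two adjacent pairs are swapped, so the sum of squared rank differences is 4 and rho is 0.8.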

Table 3 shows the win-loss matrix for category I. The rows (i)
are arranged in sequence, from top to bottom, according to their
assigned ID numbers. When one reads across the table horizontally, he
is reading the number of times the row item was preferred to any column
item, and the sum in the rightmost column shows how many times the row
item "won." When one reads down the columns vertically, he is reading
the number of times the column item lost to the row item, and the sum
at the bottom of each column shows the frequency of losses. Within the
matrix, any entry shows how many times a comparison was made between
the two items involved. For example, the maximum number of 10
comparisons was made only between cotton and denim and between cotton and
muslin. In the matrix notation, these would be two entries of the form
$a_{4,j}$. The win percents, which would be used as the initial weights
in equation (1), are shown below the column sums. A comparison of the
rankings which would be made on the basis of the Thorndike-Lorge count,
the Ford program scaling, and the win percent is shown in Table 4. A
graph showing how the weights change per iteration is presented in
Figure 1.

TABLE 1
VALIDATION TEST LISTS WITH WORDS PRESENTED IN
THORNDIKE-LORGE RANK ORDER WITHIN CATEGORIES

Rank  CATEGORY I   CATEGORY II    CATEGORY III   CATEGORY IV
  1.  cotton       cup            salt           cold
  2.  felt         bowl           sugar          rheumatism
  3.  wool         knife          sage           typhoid
  4.  lace         fork           ginger         cancer
  5.  velvet       refrigerator   vinegar        smallpox
  6.  canvas       saucer         cloves         cholera
  7.  muslin       sieve          mustard        measles
  8.  piqué        skillet        cinnamon       rheumatic fever
  9.  rayon        ladle          nutmeg         syphilis
 10.  corduroy     scraper        thyme          diabetes
 11.  denim        toaster        basil          dysentery
 12.  batiste      cleaver        cayenne        peritonitis

TABLE 2
FORD PROGRAM RANK ORDERING OF WORDS WITHIN CATEGORIES

(Words are listed in Ford program rank order. Criterion rank
numbers and the Spearman rank order correlation between the
computer and criterion rank orders are shown.)

CATEGORY I         CATEGORY II        CATEGORY III       CATEGORY IV
(rho = .521)       (rho = .598)       (rho = .687)       (rho = .460)

 1. cotton          4. fork            1. salt            1. cold
 3. wool            6. saucer          2. sugar           4. cancer
 4. lace            3. knife           7. mustard         7. measles
 6. canvas          1. cup             5. vinegar         9. syphilis
10. corduroy        2. bowl            6. cloves         10. diabetes
 5. velvet          8. skillet         8. cinnamon        2. rheumatism
11. denim          11. toaster         9. nutmeg          5. smallpox
 9. rayon           5. refrigerator    4. ginger          6. cholera
 2. felt           12. cleaver         3. sage            3. typhoid
 8. piqué           9. ladle          11. basil          11. dysentery
 7. muslin         10. scraper        12. cayenne         8. rheumatic fever
12. batiste         7. sieve          10. thyme          12. peritonitis

The observed rank correlation of .521 between the Thorndike-Lorge
and category I rankings is not as high as one would like. An
examination of the rankings showed a great discrepancy for the word
"felt." Two good reasons can be given for this discrepancy with the
benefit of hindsight. First, it was found that "felt" in the
Thorndike-Lorge count includes the past tense of "feel", which would
account for its high position in the word count. The cloth, felt, is
included also. The subjects were, of course, ranking the latter use of
the word. Second, the Thorndike-Lorge count was published in 1944 and
the use of felt has diminished greatly since then so that it is not as
familiar to a new generation of persons. Recomputation of the
correlation for category I with felt removed resulted in a rho of .788.

Similarly,  rho  of  .460  was  disappointing  for  category  IV  (Table 
2) .   Inspection  of  the  differences  in  rankings  showed  typhoid  and 
syphylis  occupying  diametrically  opposite  positions  in  the  two  rankings 
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TABLE  4 
COMPARATIVE  RANK  ORDERING  OF  CATEGORY  I  ITEMS 

Thorndike-Lorge  Ford  Program             Win  Percent 

1.  Cotton  1  1 

2.  Felt  9  9 

3.  Wool  2  2 

4 .  Lace  3  3 

5.  Velvet  6  6 

6.  Canvas  4  4 

7.  Muslin  11  10 

8.  Pique'  10  11 

9 .  Rayon  8  7 

10.  Corduroy  5  5 

11.  Denim  7  8 

12.  Batiste  12  12 
(Table 1). The differences could again be accounted for by changing
trends in the incidence of the diseases and the openness with which
syphilis is mentioned today compared with 1944. Moreover, the military
personnel who served as subjects would be more sensitive to syphilis as
a disease than the population at large owing to the emphasis given
venereal disease prevention in the military services. With the
differences in the observed ranks halved for the two diseases, rho for
category IV was increased to .585. With these two changes, each of the
four obtained correlation coefficients was found to be significantly
different from a hypothesized rho of zero by a two-tailed t test at the
.05 level.
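
The significance statements above can be checked with a short sketch. It assumes the usual conversion of a rank correlation to a t statistic with n - 2 degrees of freedom, which the report does not spell out. It also assumes 12 items in each list (11 for category I once felt is removed), which is known from Table 4 for category I only. The function name and the table of critical values are illustrative.

```python
import math

def t_for_rho(rho: float, n: int) -> float:
    """t statistic for testing a rank correlation against zero (df = n - 2)."""
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))

# Two-tailed .05 critical values of Student's t from standard tables.
T_CRIT_05 = {9: 2.262, 10: 2.228}

# (label, rho from the text, ASSUMED number of items n)
cases = [
    ("category I, all items", 0.521, 12),
    ("category I, felt removed", 0.788, 11),
    ("category IV, as observed", 0.460, 12),
    ("category IV, differences halved", 0.585, 12),
]

for label, rho, n in cases:
    t = t_for_rho(rho, n)
    verdict = "significant" if abs(t) > T_CRIT_05[n - 2] else "not significant"
    print(f"{label}: t = {t:.2f}, df = {n - 2}, {verdict}")
```

Under these assumptions the unadjusted coefficients (.521 and .460) fall short of the critical value and the adjusted ones (.788 and .585) exceed it, which agrees with the statement that significance was reached with these two changes.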

Table  4  also  suggests  that  the  win  percent  calculated  from  the 
win-loss  matrix  is  closely  related  to  the  final  ordinal  rankings  of 
the  items.   In  consonance  with  this  observation,  it  was  found  that 
rank  order  stability  was  reached  after  the  first  iteration  for  cate- 
gories I,  III,  and  IV  and  after  the  third  iteration  for  category  II. 
Category  I  converged  in  35  iterations  and  category  III,  in  16.   No 
convergence  was  reached  for  categories  II  and  IV  after  50  iterations. 
Four  objects  in  category  III  were  rated  as  universal  highs  and  were 
removed  prior  to  computation  of  weights. 

D.   DISCUSSION  AND  SUMMARY 

To recapitulate, the validation procedure used 20 individuals
who were assigned in groups of 10 to four tasks requiring them to make
ordinal judgments that were made purposefully difficult. The results
showed that in all four cases the judgments made by the group were
significantly  related  to  the  criterion,  that  ordinal  rankings  of  the 
judged  items  were  made  quickly  and  efficiently,  and  that  in  two  of  the 
four  tasks,  the  numerical  scaling  of  the  ranked  items  had  converged 
to  a  stable  position.   The  magnitude  of  the  corrected  correlation 
coefficients  showed  that  approximately  30  to  60  percent  of  the  total 
variance  was  accounted  for  in  the  correspondence  between  judgments 
and  the  criterion.   This  is  considered  excellent  in  view  of  the  many 
factors  that  operated  to  attenuate  the  correlation  coefficients.   First, 
as  mentioned  above,  the  criterion  was  based  on  old  information.   More- 
over, the  criterion  was  based  on  a  word  count  made  entirely  from  printed 
materials,  whereas  the  task  given  the  judges  implied  familiarity  of 
the  words  based  on  all  contexts.   Too,  the  Thorndike-Lorge  word  count 
used  all  meanings  of  the  words — e.g.,  ginger  as  a  seasoning  and  a  girl's 
name,  sage  as  a  seasoning  and  a  wise  man — whereas  their  familiarity  was 
judged  in  the  specific  category  specified.   Additionally,  the  crucial 
assumption  that  made  this  validation  possible — that  familiarity  with 
verbal  materials  is  related  to  their  frequency  of  occurrence  in  the 
language — is  in  itself  not  a  perfect  relationship.   Another  factor  that 
was  no  doubt  a  severe  constraint  on  the  magnitude  of  the  correlations 
was  the  way  the  words  were  chosen  for  the  lists.   That  is,  there  was 
a  very  minute  difference  in  the  frequency  count  of  some  words  within 
their  selection  bands.   In  fact,  two  words  in  one  of  the  middle  bands 
and  all  four  words  in  the  bottom  band  of  the  category  I  list  were  tied 
in  frequency  in  the  Thorndike  and  Lorge  general  count.   This  was  done 
to  ensure  that  there  would  be  a  large  number  of  ties  in  the  rankings
of  the  judges  in  order  to  make  a  thorough  test  of  the  Ford  program. 
Considering  the  total  impact  of  these  attenuating  factors,  the  obtained 
correlation  coefficients  are  very  high  and  provide  strong  evidence  for 
the  efficiency  of  the  Ford  ranking  procedure  and  the  Pelz  and  Andrews 
computer  program  as  adapted  for  the  Naval  Postgraduate  School's  IBM 
360/67  system. 

IV.   TRIAL  APPLICATION  OF  THE  FORD  PROGRAM 

A.   PROBLEM  SELECTION 

It  has  been  shown  that  the  Ford  program  is  effective  in  taking 
ratings  of  judges  with  respect  to  an  abstract,  qualitative  dimension 
and  scaling  them.   The  next  and  final  step  in  this  project  is  to  deter- 
mine whether  the  procedures  can  be  efficiently  and  effectively  applied 
to  a  practical  problem.   If  the  former  test  can  be  considered  a  vali- 
dation of  the  program,  the  next  step  could  be  called  a  trial  applica- 
tion of  the  program. 

It  would  be  desirable  to  have  the  trial  application  duplicate 
in  detail  a  planned  or  proposed  actual  use  of  the  Ford  program.   Now, 
it  was  emphasized  in  the  previous  report  (Arima,  1971)  that  proper 
project  selection  was  a  crucial  component  of  successful  laboratory 
management.   Dr.  Donald  F.  Hornig,  then  director  of  the  Office  of 
Science  and  Technology  in  the  Executive  Office  of  the  President,  was 
quoted as saying that one of the most critical questions in the
effective utilization of Federal laboratories was "The choice of
problems, their significance, and the feasibility of finding solutions
through research and development . . . ." (Subcommittee, 1968; p. 9). One way
to  improve  project  selection  might  be  to  examine  current  projects  for 
their  significance  using  representatives  of  sponsor  and  using  agencies, 
and  to  examine  the  feasibility  of  finding  solutions  through  research 
and development by having in-house scientific/technical personnel
evaluate current projects from this standpoint. This line of reasoning led
the  trial  application  of  the  Ford  program  to  the  problem  of  evaluating 
the  significance  of  current  programs. 

B.   METHOD 

1.   Stimulus  Materials. 

As  part  of  the  review  of  in-house  laboratories  being  conducted 
by  the  Director  of  Defense  Research  and  Engineering,  the  Director  of 
Navy  Laboratories  by  letter  dated  25  March  1971  requested  various  ac- 
tivities within  the  Navy  to  document  significant  contributions  and 
accomplishments  by  their  in-house  laboratories.   Using  the  material 
prepared  in  response  to  this  request  by  the  Personnel  Research  Divi- 
sion, Bureau  of  Naval  Personnel,  for  the  Navy's  personnel  research 
laboratories,  10  programs  were  selected  at  random  as  items  to  be  rated 
for  this  trial  application  of  the  Ford  program.   The  project  descrip- 
tions given  in  the  report  were  edited  and  condensed,  in  some  cases, 
and  appear  in  Appendix  IV.   A  listing  of  the  programs  chosen  is  shown 
below.   The  numbers  and/or  the  short  title  (in  parentheses)  given  in 
the  listing  will  hereafter  be  used  to  reference  and  identify  the  pro- 
grams.  The  programs  were: 
(1)  Improved Enlisted Personnel Distribution and Management
(Personnel Distribution)

(2)  Ship Manning Requirements Techniques (Manning Requirements)

(3)  Evaluation of Standards for Navy Reenlistment (Reenlistment
Standards)

(4)  Development of Navy Military Personnel Costing Techniques
for Use in Determining Cost Implications Associated with Changes in
Reenlistment Rates (Reenlistment Costing)

(5)  Design of an Optimum Personnel Force Structure (Personnel
Structure)

(6)  Interest Measurement in Officer Selection (Officer Selection)

(7)  Evaluation Survey of the Effectiveness of Submarine Sonar
Operator Training (Sonar Training)

(8)  Marginal Personnel/Minority Group Testing (Personnel Testing)

(9)  Personnel Cost Research for Early Man/Machine Design
Trade-offs (Man-Machine Costs)

(10)  LOFARGRAM Analysis Procedures (LOFARGRAM Analysis)

2.  Subjects 

The  subjects  were  10  Navy  officer  students  attending  the  Naval 
Postgraduate  School. 

3.  Procedure 

The  method  was  essentially  identical  to  the  validation  procedures. 
Each  subject  was  given  a  copy  of  the  research  programs  (Appendix  IV)  and 
instructed  to  make  an  ordinal  ranking  of  the  items  with  respect  to  their 
desirability  and  need  for  retention  and  further  development  as  research
programs  within  the  Navy.   As  before,  they  were  told  to  rank  only  those 
items  which  they  could  with  confidence,  to  use  as  many  ranks  as  they 
deemed  necessary,  and  to  place  as  many  programs  as  they  desired  in  any 
ranking  category.   They  were  advised  to  review  the  programs  first  and 
then  decide  on  the  number  of  ranking  categories  to  use.   Having  done 
this,  they  wrote  the  number  of  the  rank  chosen  beside  the  program  des- 
cription.  Cards  were  keypunched  from  these  data  and  run  through  the 
Ford  program. 

C.   RESULTS 

The  rankings  given  the  10  programs  by  the  10  judges  are  shown  in 
Table  5.   The  smallest  number  of  programs  ranked  was  four  by  judge  number 
six.   Another  judge  ranked  8  items,  and  the  other  eight  judges  ranked 
all  programs.   Of  the  latter,  five  judges  used  three  categories;  one  used 
four;  another  five;  and  another,  10  categories.   The  number  of  comparisons 
made  by  each  judge  is  shown  in  Table  6  for  a  total  of  312  comparisons. 
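
The comparison counts in Table 6 follow from how many items a judge ranked and how the items were spread over rank categories, since items tied within a category yield no comparison. A minimal sketch of that arithmetic is given here. The function name is illustrative, and the category sizes other than the ten-category case are hypothetical, because the report gives only the resulting counts.

```python
from math import comb

def comparisons(category_sizes):
    """Number of paired comparisons implied by one judge's ranking.

    category_sizes lists how many items the judge placed in each rank
    category. Items sharing a category are tied and contribute no comparison.
    """
    ranked = sum(category_sizes)
    tied_pairs = sum(comb(k, 2) for k in category_sizes)
    return comb(ranked, 2) - tied_pairs

# A judge who puts each of 10 items in its own category.
print(comparisons([1] * 10))
# Hypothetical three-category split of all 10 items.
print(comparisons([3, 3, 4]))
# Hypothetical judge who ranks only 4 items, two in each of two categories.
print(comparisons([2, 2]))
```

The first case gives 45, the count shown for the judge who used 10 categories. The other two splits give 33 and 4, which are among the counts in Table 6, although the splits actually used by those judges are not reported.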

The win-loss matrix is shown in Table 7 with sums of wins (a_i.)
and losses (a_.j) in the right and bottom margins, respectively. There
were no universal highs or lows. Only 14 iterations were required to
achieve stable weights at the .005 criterion. The program used 7.55 secs.
of central processor unit time. Table 8 shows a summary of the results.
The items are listed in the ordinal order of final ranks and show the
number of comparisons in which each item was involved (sums of wins and
losses), the win percent, and the final weights.
TABLE 5
RANKINGS OF TRIAL APPLICATION PROJECTS BY INDIVIDUAL JUDGES

Project

 1.  Personnel Distribution
 2.  Manning Requirements
 3.  Reenlistment Standards
 4.  Reenlistment Costing
 5.  Personnel Structure
 6.  Officer Selection
 7.  Sonar Training
 8.  Personnel Testing
 9.  Man-machine Costs
10.  LOFARGRAM Analysis

Judges:   1   2   3   4   5   6   7   8   9   10

2 

2 

1 

1 

1 
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1 

1 
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7 
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I 

1 

1 

5 

1 

2 

2 
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3 

1 

2 

3 

2 

2 

1 
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3 
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3 

2 

4 

3 
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2 

10 

2 

3 

2 

2 

3 

3 

2 

4 

9 

2 

2 

3 

1 

1 

4 

3 

4 

8 
TABLE 6
NUMBER OF COMPARISONS MADE BY EACH JUDGE IN
THE TRIAL APPLICATION TEST

Judge Number      Number of Comparisons

      1                    31
      2                    32
      3                    33
      4                    28
      5                    32
      6                     4
      7                    35
      8                    33
      9                    39
     10                    45

  TOTAL                   312

TABLE 8
SUMMARY RESULTS OF THE TRIAL APPLICATION TEST

                                 Number of                     Final
Project                          Comparisons    Win Percent    Weights

 1.  Personnel Distribution          66            80.3         1.606
 2.  Manning Requirements            60            68.3          .868
 3.  Reenlistment Standards          62            64.4          .702
 4.  Reenlistment Costing            63            60.2          .620
 5.  Personnel Structure             60            60.0          .612
 6.  Officer Selection               66            47.0          .338
 7.  Sonar Training                  59            35.4          .217
 8.  Personnel Testing               66            28.8          .175
 9.  Man-machine Costs               64            29.7          .174
10.  LOFARGRAM Analysis              58            29.3          .171

D.   DISCUSSION  AND  SUMMARY

While  the  number  of  judges  and  the  number  of  alternatives  eval- 
uated were  small,  consistent  trends  were  evident.   Except  for  one  judge 
who  only  contributed  four  comparisons,  all  the  other  judges  contributed 
from  28  to  45  comparisons,  showing  that  any  judge  makes  a  significant 
contribution  to  the  total  number  of  judgments,  even  if  he  does  not  rank 
all  items  and  uses  few  rank  categories.   Similarly,  in  spite  of  the 
freedom  permitted  the  judges  in  choosing  items  to  rate  and  the  number 
of  rating  categories,  the  entries  in  Table  8  show  that  all  items  entered 
into  a  fairly  uniform  number  of  comparisons  with  a  range  from  58  to  66. 
Obviously,  both  of  these  distributions  will  depend  on  the  sample  of 
judges  and  the  types  and  number  of  alternatives  to  be  judged,  but  it  is 
apparent  from  this  trial  that  there  will  be  a  central  tendency  in  the 
number  of  categories  judges  will  choose  to  use  and  the  number  of  altern- 
atives a  judge  will  adjudicate.   Similarly,  the  alternatives  will  tend 
to  attract  a  fairly  uniform  number  of  comparisons  over  a  number  of 
judges. Moreover, when the choices are difficult, there will probably
not be any universal highs or lows, thanks to those who bet the long
shots and the others who will give the lowest underdog a boost. The
most  important  finding,  however,  was  that  the  weights  stabilized  rapidly, 
indicating  that  a  group  of  judges  can  achieve  reasonable  consensus  in 
their  composite  judgment.   Finally,  the  efficiency  of  the  system  was 
revealed  by  the  very  short  computer  time  required  for  the  scaling. 

Five  of  the  rated  programs  could  be  identified  in  the  work  plans 
of  the  two  laboratories  with  some  degree  of  certitude.   From  these 
descriptions, the five were ranked according to FY1971 expenditures
for each program, and a Spearman rank correlation coefficient was
calculated with the ranks of the programs based on their weights
obtained from the 10 judges. The obtained rho was .60, which suggests that
there  is  a  relationship  between  the  amounts  being  invested  in  these 
research  projects  and  the  combined  judgments  of  Naval  officers  who 
are  representative  of  user  elements  of  the  Navy.   This  trend  lends 
credence  to  the  suggestion  presented  above,  that  the  Ford  program 
might  well  be  used  to  analyze  project  selection  based  on  the  relation- 
ship between  funding  and  user  ratings,  professional  estimates  of  feas- 
ibility of  finding  solutions  through  research,  and  the  resources  ac- 
tually being  programmed  for  the  projects. 
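
As a worked illustration of the coefficient reported above, the usual rank-difference formula can be solved for the sum of squared rank differences. This assumes that formula was the one used and that neither ranking of the five programs contained ties, neither of which the report states. Here d_i denotes the difference between a program's expenditure rank and its judged rank.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2}-1)},
\qquad
n = 5,\ \rho = .60
\;\Longrightarrow\;
\sum_{i=1}^{n} d_i^{2} = \frac{(1 - .60)\cdot 5 \cdot (25 - 1)}{6} = 8
```

With only five programs, a coefficient of this size rests on a small total of squared rank differences, so it is best read as a trend, as the text does.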

V.   ADDITIONAL  APPLICATIONS 

A.   SITUATION 

Concurrent  with  this  study,  an  investigation  was  being  made  into 
the  relative  values  of  the  major  segments  of  the  Naval  Postgraduate 
School's  operations  research  courses  as  seen  by  the  student.   One  group 
of  54  graduating  students  in  the  operations  analysis  curriculum  and 
another  group  of  15  graduating  students  in  various  management  curricula 
had  been  asked  to  rank  nine  program  segments  in  the  operations  research 
list  of  courses.   The  data  lay  unanalyzed  because  of  the  many  ties  (which 
were  permitted)  and  because  students  had  ranked  different  numbers  of  the 
program  segments.   (They  could  not  rank  courses  they  had  not  taken.) 
B.  RESULTS

The  data  were  in  a  form  that  would  be  obtained  in  an  application 
of  the  Ford  procedure.   Accordingly,  they  were  run  through  the  Ford  pro- 
gram with  a  convergence  criterion  of  .005.   The  criterion  was  reached 
in  14  iterations  for  the  54  operations  analysis  students  and  in  24  iter- 
ations for  the  management  students.   The  result  was  a  useful  scaling  of 
the  items  for  the  purposes  that  had  motivated  their  collection. 

C.  COMMENTS 

This  application  in  a  genuine  research  setting  shows  the  utility 
of  the  Ford  program.   It  confirms  statements  made  above  in  the  discus- 
sion of  the  trial  application  test  that  a  consensus — in  the  form  of  weight 
convergence — is  rapidly  reached  when  knowledgeable  judges  rate  clearly 
defined,  real-world  alternatives.   One  must  conclude  that  the  Ford  pro- 
gram could  be  used  to  good  advantage  in  the  many,  ever  increasing,  dif- 
ficult, decision  situations  which  are  currently  arising  in  which  value 
judgments  made  by  individuals  are  the  major  sources  of  data.   It  should 
be  noted,  too,  that  the  data  had  been  collected  in  a  manner  that  was 
identical  to  an  application  of  the  Ford  procedure.   In  this  case,  how- 
ever, circumstances  dictated  that  they  be  collected  in  this  fashion. 
That  is,  the  investigators  felt  that,  to  get  a  valid  sampling  of  opinions, 
the  individual  judge  had  to  be  permitted  to  use  the  number  of  rating 
categories  he  desired  (effectively  accomplished  by  permitting  multiple 
ties)  and  to  refrain  from  adjudicating  those  items  with  which  he  was 
not familiar. That these elements should be characteristic of a good
scheme for collecting qualitative judgments was mentioned in the
introductory portions of this study.

VI.   SUMMARY  AND  CONCLUSIONS 

The  purpose  of  this  study  was  to  develop,  validate,  and  test 
the  feasibility  of  a  procedure  for  obtaining  qualitative  judgments 
from  individuals  to  be  used  in  evaluating  the  effectiveness  and  opera- 
tions of  the  Navy's  in-house,  personnel  research  laboratories.   The 
Ford  procedure  for  scaling  partially  ordered  rankings,  as  programmed 
by  Pelz  and  Andrews,  was  further  programmed  for  the  Naval  Postgraduate 
School's  IBM  360/67  system.   The  procedures  and  program  were  validated 
using  an  arbitrary,  abstract  task  for  which  there  was  an  extrinsic 
criterion  and  tested  for  feasibility  in  research  evaluation  using  des- 
criptions of  actual  program  projects.   In  both  cases,  the  results  were 
highly  satisfactory. 

It  can  be  concluded  that  the  Ford  procedure  and  present  program 
can  be  used  to  obtain  qualitative  judgments  with  accuracy  and  efficiency. 
The  utility  of  the  program  is  limited  only  by  the  imagination  and  crea- 
tivity of  the  user  in  devising  appropriate  rating  schemes  for  his  pur- 
pose.  It  should  be  a  very  useful  tool  for  the  many  researchers  who  today 
are  faced  with  analyzing  "quality  of  life"  variables  for  which  conven- 
tional measurements  do  not  exist. 
APPENDIX   I
FLOW  CHART  OF  THE  FORD  PROGRAM 
Preparation of the win-loss matrix

    Initial weighting factor determined from win-loss matrix, i.e.,
    percentage of win-loss.

    I = 1,N
      J = 1,N
        A(I,J) = A(I,J) + .00001
            Adds small constant to all cells of win-loss matrix.
            Satisfies partitioning assumption of Ford in his procedure.

    IP1 = I + 1
      J = IP1,N
        A(I,J) = A(I,J) + A(J,I)
        A(J,I) = A(I,J)
            Computes the number of times objects i and j are ranked
            relative to each other, a_ij + a_ji.

    REWIND 9
    RETURN

Iteration of the weights

    DENOM = 0
    DENOM = DENOM + A(I,J) / (X(I) + X(J))
            Accumulates (a_ij + a_ji) / (w_i + w_j), using the
            weighting factors on the nth iteration.

    Y(I) = W(I) / DENOM
            New weighting factor for object i for iteration being
            considered.

    JO = JO + 1
            Iteration counter.

            Counts the number of weights which change more than
            convergence criterion.

    I = 1,N
      X(I) = Y(I)
            Sets weight factor equal to new computed weight for
            object in question.

    WRITE JO, KO, Y(I), I = 1,N
            Writes intermediate weights of each object giving
            iteration count and a count of how many changed more
            than convergence criterion.

            Compares iteration count against pre-set number of
            iterations to terminate without convergence.

    WRITE MAN(J), LIST(MAN(J)), Y(I)
            Writes final weights of objects giving assigned and
            original ID numbers.

    RETURN
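
The iteration in the flow chart can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a transcription of the program in this report. W(I) is taken to be the total wins of object i, the starting weights are the win proportions, the diagonal is left out of all sums, convergence is judged on absolute change, and no rescaling of the weights is applied. Universal highs and lows, which the report removes before computation, are not handled. The function and variable names are illustrative.

```python
def ford_weights(wins, criterion=0.005, max_iter=50, eps=0.00001):
    """Iterate Ford-style weights from a win-loss matrix.

    wins[i][j] is the number of times object i was ranked above object j.
    Returns (weights, iterations_used, converged).
    """
    n = len(wins)
    # Add the small constant to every off-diagonal cell (ASSUMPTION: the
    # diagonal, an object against itself, is excluded from all sums).
    a = [[wins[i][j] + eps if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    total_wins = [sum(a[i]) for i in range(n)]            # W(I), assumed
    pairs = [[a[i][j] + a[j][i] for j in range(n)]        # a_ij + a_ji
             for i in range(n)]
    # Initial weights: proportion of comparisons won (ASSUMED scale).
    x = [total_wins[i] / sum(pairs[i]) for i in range(n)]

    for iteration in range(1, max_iter + 1):
        y = []
        for i in range(n):
            denom = sum(pairs[i][j] / (x[i] + x[j])
                        for j in range(n) if j != i)
            y.append(total_wins[i] / denom)               # Y(I) = W(I)/DENOM
        # Count the weights that moved by more than the criterion.
        changed = sum(abs(y[i] - x[i]) > criterion for i in range(n))
        x = y
        if changed == 0:
            return x, iteration, True
    return x, max_iter, False


# Hypothetical win-loss matrix for three objects.
example = [[0, 3, 4],
           [1, 0, 3],
           [0, 1, 0]]
weights, used, converged = ford_weights(example, max_iter=200)
print(weights, used, converged)
```

The return value separates the two outcomes described in the text: weights that stabilized within the criterion, and runs that stopped at the iteration limit without convergence.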
APPENDIX   II
DATA  ASSEMBLY  FOR  INPUT  TO  THE  FORD  PROGRAM 
DATA CARD SET UP

A.  LABEL CARD - Type "1" in col. one, then any 71H. (This will be
    printed out by machine.)

B.  PARAMETER CARD - All numbers right adjusted. Omit all leading
    zeros.
    Col. 1-6   - Total # of objects being compared by all judges ≤ 130
    Col. 7-12  - # of judges ≤ 130
    Col. 13-18 - Convergence criterion (.005 presently used)
    Col. 19-24 - Max # of iterations

C.  JUDGE CARD - Right adjusted. Omit leading zeros.
    Col. 1-6   - # of ranks used by judge ≤ 130

D.  DATA CARD - Right adjusted. Use leading zeros.
    Col. 1-3   - # of objects placed in this rank by judge.
    Col. 4-6   - ID # of object (original ID #)
    Col. 7-9   -   .
                   .
    Col. 70-72 -   .
    Continue with as many cards as necessary to fill out rank.
    Subsequent cards begin ID # Col. 1-3.

Repeat C. and D. for each judge.
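
The column layout above can be illustrated with a small sketch that builds card images as strings. The function names are hypothetical, and the way the convergence criterion is written in its field (here the text ".005") is an assumption, since only the field position is specified.

```python
def parameter_card(n_objects, n_judges, criterion=".005", max_iter=50):
    """Four right-adjusted 6-column fields with no leading zeros."""
    return f"{n_objects:>6}{n_judges:>6}{criterion:>6}{max_iter:>6}"

def judge_card(n_ranks):
    """Number of ranks used by the judge, right-adjusted in col. 1-6."""
    return f"{n_ranks:>6}"

def data_cards(object_ids):
    """Card images for one rank category of one judge.

    The first 3-column field holds the count of objects in the rank; the
    remaining fields hold zero-padded object IDs. A card carries 24 fields
    (col. 1-72), and overflow cards begin with an ID in col. 1-3.
    """
    fields = [f"{len(object_ids):03d}"] + [f"{i:03d}" for i in object_ids]
    per_card = 72 // 3
    return ["".join(fields[k:k + per_card])
            for k in range(0, len(fields), per_card)]

print(repr(parameter_card(10, 10)))
print(repr(judge_card(3)))
print(data_cards([4, 7, 10]))
```

A rank holding more than 23 objects spills onto a second card, which is why the layout notes that subsequent cards begin with an ID number in col. 1-3.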


CARD ASSEMBLY

Cards in reading order, first card at top:

//NAME, etc.                                                  (GREEN)
// EXEC FORTCLG,REGION.GO=150K
//FORT.SYSIN DD *
      PROGRAM
//GO.FT09F001 DD UNIT=SYSDA,DISP=(NEW,DELETE),SPACE=(CYL,1)
//GO.SYSIN DD *
      DATA CARDS
/*                                                            (ORANGE)
APPENDIX   III
THE  FORD  PROGRAM

APPENDIX   IV
PROJECT  DESCRIPTIONS  FOR  TRIAL  APPLICATION  TESTING

1.  TITLE:   Improved Enlisted Personnel Distribution and
Management.

DESCRIPTION:   A computer assisted distribution and assignment
(CADA) system is being designed to help improve the
utilization of enlisted manpower. A preliminary model
currently is being implemented in the Pacific Fleet. A
prototype model is now under development for application in
BUPERS in support of centralized management of enlisted
ratings. Related research results include development of
computer and mathematically based procedures for (1) the
equitable allocation of personnel resources, (2) the
optimal match of man and billet, (3) the identification of
billet vacancies in order of priority, (4) the projection
of the number of distributable assets, and (5) the
feedback of information on the results of distribution
management actions.

2.  TITLE:   Ship Manning Requirements Techniques.
DESCRIPTION:   The increasing sophistication and complexity
of naval ships, systems, and equipments in the face of
Project Volunteer and a smaller Navy require the
development of methods which will improve the accuracy
of manpower requirements forecasting and manpower
utilization.

A technique for defining and documenting manpower
requirements for ships based on the application of
selected work study techniques to basic manning criteria in
each of the separate work areas aboard ship has been
developed. It permits the production of a document which
displays in detail the rationale for manning by ship
classes based on equipment and required operational
capabilities to meet mission assignment.

3.  TITLE:   Evaluation  of  Standards  for  Navy  Reenlistment.
DESCRIPTION:   This  research  was  generated  out  of  concern 
over  the  quality  of  reenlistees.   Unsatisfactory  perform- 
ance was  costing  the  military  services  enormous  amounts 
of  money  in  such  things  as  reenlistment  bonuses  and  pay 
and  allowances  for  reenlistees  from  whom  commensurate 
service  was  not  realized.   Court  and  confinement  costs  of 
reenlistees  were  cited.   It  was  suspected  that  personnel 
of  inferior  quality  were  being  allowed  to  reenlist,  in- 
cluding some  with  unsatisfactory  first  term  records. 

In  an  attempt  to  identify  unsatisfactory  individuals 
prior  to  reenlistment,  comparisons  were  made  between  un- 
satisfactory and  satisfactory  reenlistees  on  information 
available  at  the  time  of  the  reenlistment  decision.  The 
project  also  provided  information  on  the  effect  on  manning 
which  would  result  if  reenlistment  standards  were  made 
more  stringent. 
4.  TITLE:   Development of Navy Military Personnel Costing
Techniques for Use in Determining Cost Implications
Associated with Changes in Reenlistment Rates.

DESCRIPTION:   Thousands  of  skilled  technicians  are  re- 
quired to  operate  and  maintain  the  complex  systems  and 
equipment  now  in  the  Fleet.   The  Navy  constantly  experi- 
ences difficulty  in  retaining  these  technicians  because 
of  competition  for  them  from  other  sectors  of  the 
economy . 

To  alleviate  this  problem,  several  technician-oriented 
procurement  programs  and  career  incentive  programs  are 
employed.   To  facilitate  evaluation  of  these  programs,  a 
methodology  for  determining  the  relative  cost  benefits 
associated  with  retention  of  personnel  has  been  developed. 

5.  TITLE:   Design  of  an  Optimum  Personnel  Force  Structure.
DESCRIPTION :   An  optimum  force  structure  containing  ap- 
propriately qualified  personnel  in  sufficient  numbers  at 
least  cost  cannot  now  be  certified.   This  project  is  con- 
cerned with  the  development  of  improved  techniques  to 
analyze  and  balance  the  relationship  between  personnel 
requirements  and  the  composition  of  the  existing  force 
structure . 

6.  TITLE:   Interest  Measurement  in  Officer  Selection.
DESCRIPTION:   Each  year  several  thousand  young  men  apply 
for  officer  training  programs  at  the  Naval  Academy  and 
NROTC  units  at  various  colleges.   High  attrition  rates 
are  experienced  in  both  training  and  active  duty.   To
reduce  the  cost  of  losing  substantial  proportions  of 
these  men,  it  is  imperative  that  those  applicants  having 
the  greatest  career  potential  be  identified  in  the  selec- 
tion process.   Several  years  of  research  on  vocational 
interest  tests  and  biographical  questionnaires  have  made 
it  possible  to  identify  those  applicants  most  likely  to 
successfully  complete  officer  training  and  remain  in  the 
Navy  after  completing  their  minimum  requirements. 

7.  TITLE:   Evaluation  Survey  of  the  Effectiveness  of  Sub- 

marine Sonar  Operator  Training. 

DESCRIPTION :   A  comprehensive  survey  was  accomplished  of 
the  proficiency,  training,  and  utilization  of  submarine 
sonar  technicians  and  sonar  watchstanders .   The  survey 
provided  up-to-date  information  concerning  the  efficiency 
of  training  procedures.   Such  information  is  necessary  on 
a  periodic  basis  to  insure  appropriate  alignment  of  the 
training  to  fleet  requirements  in  order  to  prevent  seri- 
ous impairment  of  operational  fleet  submarine  ASW 
efficiency.   Data  gathering  instruments  included  interview 
forms,  self  ratings,  supervisor  ratings,  knowledge  tests, 
and  performance  tests. 

8.  TITLE:   Marginal  Personnel/Minority  Group  Testing. 
DESCRIPTION:   Present  test  batteries  used  in  both  military 
and  civilian  settings  have  been  criticized  for  alleged 
inequities  when  used  with  groups  defined  on  the  basis  of 
race  or  ethnic  affiliation.   Public  policy  as  well  as 
efficient  manpower  utilization  requires  that  all  personnel
be  afforded  equality  of  opportunity  in  assignment  and  that 
those  abilities  being  measured  bear  relevance  to  skills 
required  on-the-job. 

9.  TITLE:   Personnel  Cost  Research  for  Early  Man/Machine 
Design  Trade-Offs. 

DESCRIPTION:   The  critical  element  of  personnel  cost  has 
not  been  systematically  considered  when  making  system 
design  and  development  decisions  early  in  the  system  de- 
velopment cycle.   No  tools  exist  to  enable  the  cost- 
effectiveness  of  such  decisions  to  be  measured.   For  this 
reason,  research  was  undertaken  to  develop  a  personnel 
cost model for use in personnel and man-equipment trade-off
decisions. A basic model was completed which allowed
the  identification  of  all  pertinent  cost  items  and  the 
accumulation  of  cost  elements  in  an  unequivocal  manner. 

10.  TITLE:   LOFARGRAM  Analysis  Procedures. 

DESCRIPTION:   The  airborne  JEZEBEL  system  has  shown  great
potential  as  a  means  of  detecting  and  classifying  under- 
water contacts;  however,  its  usefulness  has  been  continu- 
ally hampered  by  the  lack  of  adequately  trained  operators. 
One  of  the  main  reasons  for  operator  deficiencies  is  that 
training  programs  have  been  seriously  hampered  by  the  lack 
of  a  standardized,  systematic  procedure  for  analyzing  the
information  displayed  on  the  gram  which  is  the  main  display 
component  of  the  system. 
In  order  to  correct  this  situation,  a  systematic
LOFARGRAM  procedure  was  developed.